26 research outputs found

    Privacy-Aware Fuzzy Skyline Parking Recommendation Using Edge Traffic Facilities


    Optimization techniques on job scheduling and resource allocation for MapReduce system

    MapReduce has become a popular high-performance computing paradigm for large-scale data processing. Hadoop, an open-source implementation of MapReduce, has been widely deployed in large clusters containing thousands of machines by companies such as Yahoo! and Facebook to support batch processing of large jobs submitted by multiple users (i.e., MapReduce workloads). However, there is certainly considerable room to improve the performance and fairness of Hadoop. In this thesis, we focus on optimization techniques for job scheduling and resource allocation to improve the performance and fairness of the Hadoop system.

    First, we focus on performance optimization for MapReduce workloads under the FIFO scheduler, using a job re-ordering approach that requires no changes to the Hadoop source code. We consider two kinds of production workloads, i.e., offline and online MapReduce workloads. The performance metrics used are makespan and total completion time. For offline workloads, we propose several job ordering algorithms. Building on these offline approaches, we further propose a prototype system called MROrder to optimize the performance of online MapReduce workloads. The experimental results show that our job ordering methods can significantly improve the performance of Hadoop for both offline and online workloads.

    Second, instead of keeping the default static MapReduce resource allocation model, in which the numbers of map slots and reduce slots are pre-configured and not fungible, we relax this constraint by modifying the Hadoop source code so that slots can be reallocated to either map or reduce tasks depending on their needs. A dynamic fair resource allocation and scheduling system called DynamicMR is proposed and implemented in Hadoop. It improves the performance of MapReduce workloads while ensuring fairness, without requiring any prior information about the MapReduce jobs. The experimental results validate the effectiveness of DynamicMR. Moreover, it can also be applied to the FIFO scheduler.

    Finally, besides the optimizations for Hadoop MRv1, we also optimize fair resource allocation for YARN (i.e., Hadoop MRv2). Specifically, we consider pay-as-you-use computing (e.g., cloud computing) and find that the traditional fair policy is not suitable for such systems. To address this, we propose Long-Term Resource Fairness (LTRF) and implement it in YARN by developing LTYARN, a long-term YARN fair scheduler. Our experimental results show that it achieves better resource fairness than the existing fair scheduler. Thus, in this thesis, we have addressed several optimization problems in job scheduling and resource allocation for MapReduce systems under different scenarios. We have proposed new algorithms and frameworks to improve the performance and fairness of the Hadoop system; they offer options for users who want to optimize the performance of their MapReduce workloads or ensure fairness, according to their needs and conditions.
    DOCTOR OF PHILOSOPHY (SCE)
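The abstract does not name its job ordering algorithms, but the makespan-oriented re-ordering idea for two-stage map/reduce jobs can be illustrated with the classic Johnson's rule for two-machine flow shops, which minimizes makespan when each job's reduce stage follows its map stage. This sketch, including the job names and durations, is purely illustrative:

```python
# Illustrative sketch (not the thesis's actual algorithm): ordering
# two-stage (map, reduce) jobs with Johnson's rule to reduce makespan.

def johnson_order(jobs):
    """jobs: list of (name, map_time, reduce_time) tuples."""
    # Jobs whose map stage is shorter go first, in increasing map time;
    # the remaining jobs go last, in decreasing reduce time.
    front = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    back = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: -j[2])
    return front + back

def makespan(jobs):
    """Makespan when each stage processes one job at a time and a job's
    reduce stage starts only after its map stage finishes."""
    map_done = reduce_done = 0
    for _, m, r in jobs:
        map_done += m
        reduce_done = max(reduce_done, map_done) + r
    return reduce_done

jobs = [("A", 4, 2), ("B", 1, 3), ("C", 5, 5)]
ordered = johnson_order(jobs)
print([j[0] for j in ordered], makespan(ordered))  # ['B', 'C', 'A'] 13
print(makespan(jobs))                              # 15 in submission order
```

Even on three toy jobs, re-ordering alone shortens the makespan without touching the scheduler itself, which is the spirit of the source-code-free optimization described above.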

    Gemini: An Adaptive Performance-Fairness Scheduler for Data-Intensive Cluster Computing

    In data-intensive cluster computing platforms such as Hadoop YARN, performance and fairness are two important factors in system design and optimization. Many previous studies optimize either performance or fairness alone, without considering the tradeoff between the two. Recent studies observe that there is a tradeoff between performance and fairness because of resource contention between users/jobs. However, their scheduling algorithms for bi-criteria optimization between performance and fairness are static, ignoring the impact of different workload characteristics on this tradeoff. In this paper, we propose an adaptive scheduler called Gemini for Hadoop YARN. We first develop a regression-based model to estimate the performance improvement and the fairness loss of shared computation compared to the exclusive, non-shared scenario. Next, we leverage the model to guide the resource allocation for pending tasks so as to optimize cluster performance at a user-defined fairness level. Instead of using a static scheduling policy, Gemini adaptively chooses the proper scheduling policy according to the currently running workload. We implement Gemini in Hadoop YARN. Experimental results show that Gemini outperforms the state-of-the-art approach in two respects: 1) for the same fairness loss, Gemini improves performance by up to 225% and 200% in real deployment and large-scale simulation, respectively; 2) for the same performance improvement, Gemini reduces the fairness loss by up to 70% and 62.5% in real deployment and large-scale simulation, respectively.
    MOE (Min. of Education, S’pore)
    Accepted version
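The abstract describes Gemini's decision only at a high level. The following sketch illustrates the kind of bi-criteria choice it implies: among candidate allocations of pending tasks, pick the one with the best predicted performance improvement whose predicted fairness loss stays within the user-defined bound. The candidate names and numbers are hypothetical, standing in for estimates a regression model would produce; this is not Gemini's actual algorithm.

```python
# Illustrative sketch of a bi-criteria allocation choice (hypothetical
# candidates; in Gemini these estimates would come from a fitted model).

def choose_allocation(candidates, max_fairness_loss):
    """candidates: list of (name, perf_improvement, fairness_loss).
    Pick the best-performing candidate within the fairness budget."""
    feasible = [c for c in candidates if c[2] <= max_fairness_loss]
    if not feasible:
        # No candidate fits the bound: fall back to the fairest one.
        return min(candidates, key=lambda c: c[2])
    return max(feasible, key=lambda c: c[1])

candidates = [
    ("fair-share", 0.00, 0.00),   # strict fairness, no speedup
    ("mild-skew",  0.15, 0.05),   # small skew, moderate speedup
    ("perf-first", 0.40, 0.20),   # large speedup, large fairness loss
]
print(choose_allocation(candidates, max_fairness_loss=0.10))
# -> ('mild-skew', 0.15, 0.05): best performance within the budget
```

Re-evaluating this choice as the workload mix changes is what makes the policy adaptive rather than static.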

    Fair Resource Allocation for Data-Intensive Computing in the Cloud

    To address the computing challenge of 'big data', a number of data-intensive computing frameworks (e.g., MapReduce, Dryad, Storm and Spark) have emerged and become popular. YARN is the de facto resource management platform that enables these frameworks to run together in a shared system. However, we observe that, in a cloud computing environment, the fair resource allocation policy implemented in YARN is not suitable: its memoryless resource allocation leads to violations of a number of good properties expected of shared computing systems. This paper attempts to address these problems for YARN. Both single-level and hierarchical resource allocations are considered. For single-level resource allocation, we propose a novel fair resource allocation mechanism called Long-Term Resource Fairness (LTRF) for such environments. For hierarchical resource allocation, we propose Hierarchical Long-Term Resource Fairness (H-LTRF) by extending LTRF. We show that both LTRF and H-LTRF address the fairness problems of the current resource allocation policy and are thus suitable for cloud computing. Finally, we have developed LTYARN by implementing LTRF and H-LTRF in YARN, and our experiments show that it achieves better resource fairness than the existing fair schedulers of YARN.
    Accepted version
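The memoryless-vs-long-term distinction at the heart of LTRF can be illustrated with a toy round-based allocator: a memoryless policy equalizes usage within each round, while a long-term policy equalizes cumulative usage, so a user who yielded resources earlier is compensated later. The user names, demands, and round model here are hypothetical simplifications, not LTRF's actual mechanism.

```python
# Toy contrast between memoryless (per-round) max-min fairness and a
# long-term notion in the spirit of LTRF (hypothetical users and demands).

def allocate(demands, capacity, long_term=True):
    """Per round, hand out `capacity` units one at a time to the active
    user with the smallest score: cumulative usage if long_term, else
    usage within the current round only (memoryless max-min)."""
    rounds = len(next(iter(demands.values())))
    cumulative = {u: 0 for u in demands}
    for r in range(rounds):
        this_round = {u: 0 for u in demands}
        for _ in range(capacity):
            active = [u for u in demands if this_round[u] < demands[u][r]]
            if not active:
                break
            score = cumulative if long_term else this_round
            winner = min(active, key=lambda u: score[u])
            cumulative[winner] += 1
            this_round[winner] += 1
    return cumulative

# Alice is idle in round 0 and Bob uses the whole cluster; only the
# long-term policy lets Alice catch up afterwards.
demands = {"alice": [0, 2, 2], "bob": [2, 2, 2]}
print(allocate(demands, capacity=2, long_term=True))   # {'alice': 3, 'bob': 3}
print(allocate(demands, capacity=2, long_term=False))  # {'alice': 2, 'bob': 4}
```

The memoryless outcome is exactly the kind of property violation described above: Alice's early generosity permanently costs her resources, which discourages yielding in a pay-as-you-use setting.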